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Intuitively, if a density operator has small rank, then it should be easier to estimate from ex- 
perimental data, since in this case only a few eigenvectors need to be learned. We prove two 
complementary results that confirm this intuition. First, we show that a low-rank density matrix 
can be estimated using fewer copies of the state, i.e., the sample complexity of tomography decreases 
with the rank. Second, we show that unknown low-rank states can be reconstructed from an in- 
complete set of measurements, using techniques from compressed sensing and matrix completion. 
These techniques use simple Pauli measurements, and their output can be certified without making 
any assumptions about the unknown state. 

We give a new theoretical analysis of compressed tomography, based on the restricted isometry 
property (RIP) for low-rank matrices. Using these tools, we obtain near-optimal error bounds, for 
the realistic situation where the data contains noise due to finite statistics, and the density matrix 
is full-rank with decaying eigenvalues. We also obtain upper-bounds on the sample complexity 
of compressed tomography, and almost-matching lower bounds on the sample complexity of any 
procedure using adaptive sequences of Pauli measurements. 

Using numerical simulations, we compare the performance of two compressed sensing estimators — 
the matrix Dantzig selector and the matrix Lasso — with standard maximum-likelihood estimation 
(MLE). We find that, given comparable experimental resources, the compressed sensing estimators 
consistently produce higher-fidelity state reconstructions than MLE. In addition, the use of an 
incomplete set of measurements leads to faster classical processing with no loss of accuracy. 

Finally, we show how to certify the accuracy of a low rank estimate using direct fidelity estimation 
and we describe a method for compressed quantum process tomography that works for processes 
with small Kraus rank, and requires only Pauli eigenstate preparations and Pauli measurements. 



I. INTRODUCTION 

In recent years there has been amazing progress in 
studying complex quantum mechanical systems under 
controlled laboratory conditions [1]. Quantum tomog- 
raphy of states and processes in an invaluable tool used 
in many such experiments, because it enables a complete 
characterization of the state of a quantum system or pro- 
cess (see e.g. [2-16]). Unfortunately, tomography is very 
resource intensive, and scales exponentially with the size 
of the system. For example, a system of n spin-1/2 parti- 
cles (qubits) has a Hilbert space with dimension d = 2™, 
and the state of the system is described by a density ma- 
trix with d 2 — 4" entries. 

Recently a new approach to tomography was proposed: 
compressed quantum tomography, based on techniques 
from compressed sensing [17, 18]. The basic idea is to 
concentrate on states that are well approximated by den- 
sity matrices of rank r <^ d. This approach can be ap- 
plied to many realistic experimental situations, where the 
ideal state of the system is pure, and physical constraints 
(e.g., low temperature, or the locality of interactions) en- 
sure that the actual (noisy) state still has low entropy. 

This approach is convenient because it does not require 
detailed knowledge about the system. However, note that 
when such knowledge is available, one can use alternative 



formulations of compressed tomography, with different 
notions of sparsity, to further reduce the dimensionality 
of the problem [19]. We will compare these methods in 
Section VI B. 

The main challenge in compressed tomography is how 
to exploit this low-rank structure, when one does not 
know the subspace on which the state is supported. Con- 
sider the example of a pure quantum state. Since pure 
states are specified by only 0(d) numbers, it seems plau- 
sible that one could be reconstructed after measuring 
only O(d) observables, compared with 0(d 2 ) for a general 
mixed state. While this intuition is indeed correct [20- 
23], it is a challenge to devise a practical tomography 
scheme that takes advantage of this. In particular, one 
is restricted to those measurements that can be easily 
performed in the lab; furthermore, one then has to find 
a pure state consistent with measured data [24], prefer- 
ably by some procedure that is computationally efficient 
(note that finding minimum-rank solutions is NP-hard in 
general [25]). 

Compressed tomography provides a solution that 
meets all of these practical requirements [17, 18]. It 
requires measurements of Pauli observables, which are 
feasible in many experimental systems. In total, it uses 
a random subset of to = 0(rd\ogd) Pauli observables, 
which is just slightly more than the O(rd) degrees of 
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freedom that specify an arbitrary rank r state. Then 
the density matrix p is reconstructed by solving a con- 
vex program. This can be done efficiently using general- 
purpose SDP solvers [26], or specialized algorithms for 
larger instances [27-29]. The scheme is robust to noise 
and continues to perform well when the measurements 
are imprecise or when the state is only close to a low- 
rank state. 

Here we follow up on Refs. [17, 18] by giving a stronger 
(and completely different) theoretical analysis, which is 
based on the restricted isometry property (RIP) [30-32]. 
This answers a number of questions that could not be ad- 
dressed satisfactorily using the earlier techniques based 
on dual certificates. First, how large is the error in our 
estimated density matrix, when the true state is full-rank 
with decaying eigenvalues? We show that the error is not 
much larger than the "tail" of the eigenvalue spectrum 
of the true state. Second, how large is the sample com- 
plexity of compressed tomography, i.e., how many copies 
of the unknown state are needed, to estimate its density 
matrix? We show that compressed tomography achieves 
nearly the optimal sample complexity among all proce- 
dures using Pauli measurements, and, surprisingly, the 
sample complexity of compressed tomography is nearly 
independent of the number of measurement settings to, 
so long as to > 0(rc? poly log d). 

In addition, we use numerical simulations to investi- 
gate the question: given a fixed time T during which an 
experiment can be run, is it better to do compressed to- 
mography or full tomography, i.e., is it better to use a few 
measurement settings and repeat them many times, or 
do all possible measurements with fewer repetitions? For 
the situations we simulate, we find that compressed to- 
mography provides better accuracy at a reduced compu- 
tational cost compared to standard maximum-likelihood 
estimation. 

Finally, we provide two useful tools: a procedure for 
certifying the accuracy of low-rank state estimates, and 
a very simple compressed sensing technique for quantum 
process tomography. 

We now describe these results in more detail. 

Theoretical analysis using RIP. We use a fundamen- 
tal geometric fact: the manifold of rank-r matrices in 
C dxd can be embedded into 0{rd poly log d) dimensions, 
with small distortion in the 2-norm. An embedding that 
does this is said to satisfy the restricted isometry prop- 
erty (RIP) [32]. In [30] it was shown that such an em- 
bedding can be realized using the expectation values of 
a random subset of 0(rd poly log d) Pauli matrices. This 
implies the existence of so-called "universal" methods 
for low-rank matrix recovery: there exists a fixed set of 
0(rd poly log d) Pauli measurements, that has the ability 
to reconstruct every rank-r dx d matrix. Moreover, with 
high probability, a random choice of Pauli measurements 
will achieve this. (The earlier results of [17] placed the 
quantifiers in the opposite order: for every rank-r d x d 
matrix p, most sets of 0(rd poly log d) Pauli measure- 
ments can reconstruct that particular matrix p.) 



Intuitively, the RIP says that a set of random Pauli 
measurements is sensitive to all low-rank errors simulta- 
neously. This is important, because it implies stronger 
error bounds for low-rank matrix recovery [31]. These 
bounds show that, when the unknown matrix p is full- 
rank, our method returns a (certifiable) rank-r approxi- 
mation of p, that is almost as good as the best such ap- 
proximation (corresponding to the truncated eigenvalue 
decomposition of p) . 

In Ref. [31], these error bounds were used to show the 
accuracy of certain compressed sensing estimators, for 
measurements with additive Gaussian noise. Here, we 
use them to upper-bound the sample complexity of our 
compressed tomography scheme. (That is, we bound the 
errors due to estimating each Pauli expectation value 
from a finite number of experiments.) Roughly speak- 
ing, we show that our scheme uses 0(r 2 d 2 log d) copies 
to characterize a rank-r state (up to constant error in 
trace norm). When r = d, this agrees with the sam- 
ple complexity of full tomography. Our proof assumes 
a binomial noise model, but minor modifications could 
extend this result to other relevant noise models, such as 
multinomial, Gaussian, or Poissonian noise. 

Furthermore, we show an information-theoretic lower 
bound for tomography of rank-r states using adaptive 
sequences of single-copy Pauli measurements: at least 
Q,(r 2 d 2 1 dog d) copies are needed to obtain an estimate 
with constant accuracy in the trace distance. This gen- 
eralizes a result from Ref. [33] for pure states. Therefore, 
our upper bound on the sample complexity of compressed 
tomography is nearly tight, and compressed tomography 
nearly achieves the optimal sample complexity among all 
possible methods using Pauli measurements. 

Our observation that incomplete sets of observables are 
often sufficient to unambiguously specify a state gives 
rise to a new degree of freedom when designing exper- 
iments: when aiming to reduce statistical noise in the 
reconstruction, one can either estimate a small set of ob- 
servables relatively accurately, or else a large (e.g. com- 
plete) set of observables relatively coarsely. Our bounds 
(as well as our numerics) show that, remarkably, over 
a very large range of to the only quantity relevant for 
the reconstruction error is t, the total number of exper- 
iments performed. It does not matter over how many 
observables the repetitions are distributed. Thus, when 
fixing t and varying to, the reduction in the number of 
observables and the increase in the number of measure- 
ments per observable have no net effect with regard to the 
fidelity of the estimate, so long as to > fi(rd poly log d). 

Certification. We generalize the technique of direct fi- 
delity estimation (DFE) [33, 34] to work with low-rank 
states. Thus, one can use compressed tomography to get 
an estimated density matrix p, and use DFE to check 
whether p agrees with the true state p. This check is 
guaranteed to be sound, even if the true state p is not 
approximately low rank. Our extension of DFE may be 
of more general interest, since it can be used to efficiently 
certify any estimate p regardless of whether it was ob- 
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tained using compressed sensing or not, as long as the 
rank r of the estimate is small (and regardless of the 
"true" rank). 

Numerical simulations. We compare the performance 
of several different estimators (methods for reconstruct- 
ing the unknown density matrix). They include: con- 
strained trace-minimization (a.k.a. the matrix Dantzig 
selector), least squares with trace-norm regularization 
(a.k.a. the matrix Lasso), as well as a standard maximum 
likelihood estimation (MLE) [35-37] for comparison. 

We observe that our estimators outperform MLE in 
essentially all aspects, with the matrix Lasso giving the 
best results. The fidelity of the estimate is consis- 
tently higher using the compressed tomography estima- 
tors. Also, the accuracy of the compressed sensing esti- 
mates are (as mentioned above) fairly insensitive to the 
number of measurement settings m (assuming the total 
time available to run the experiment is fixed). So by 
choosing m <C d 2 , one still obtains accurate estimates, 
but with much faster classical post-processing, since the 
size of the data set scales like 0(m) rather than 0(d 2 ). 

It may be surprising to the reader that we outperform 
MLE, since it is often remarked (somewhat vaguely) that 
"MLE is optimal." However, MLE is a general-purpose 
method that does not specifically exploit the fact that 
the state is low-rank. Also, the optimality results for 
MLE only hold asymptotically and for informationally 
complete measurements [38, 39]; for finite data [40] or for 
incomplete measurements, MLE can be far from optimal. 

From these results, one can extract some lessons about 
how to use compressed tomography. Compressed tomog- 
raphy involves two separate ideas: (1) measuring an in- 
complete set of observables (i.e., choosing m -C d 2 ) 1 and 
(2) using trace minimization or regularization to recon- 
struct low-rank solutions. Usually one does both of these 
things. Now, suppose the goal is to reconstruct a low- 
rank state using as few samples as possible. Our results 
show that one can achieve this goal by doing (2) without 
(1). At the same time, there is no penalty in the quality 
of the estimate when doing (1), and there are practical 
reasons for doing it, such as reducing the size of the data 
set to speed up the classical post-processing. 

Quantum process tomography. Finally, we adapt our 
method to perform tomography of processes with small 
Kraus rank. Our method is easy to implement, since it 
requires only the ability to prepare eigenstates of Pauli 
operators and measure Pauli observables. In particular, 
we require no entangling gates or ancillary systems for 
the procedure. In contrast to Ref. [19], our method is 
not restricted to processes that are element-wise sparse 
in some known basis, as discussed in Section VI B. This 
is an important advantage in practice, because while the 
ideal (or intended) evolution of a system may be sparse 
in a known basis, it is often the case that the noise pro- 
cesses perturbing the ideal component are not sparse, and 
knowledge of these noise processes is key to improving the 
fidelity of a quantum device with some theoretical ideal. 



A. Related Work 

While initial work on tomography considered only lin- 
ear inversion methods [41], most subsequent work has 
largely focused on maximum likelihood methods and to 
a lesser extent Bayesian methods for state reconstruc- 
tion [35-37, 41-52]. 

However, recently there has been a flurry of work which 
seeks to transcend the standard MLE methods and im- 
prove on them in various ways. Our contributions can 
also be seen in this context. 

One way in which alternatives to MLE are being pur- 
sued is through what we call full rank methods. Here the 
idea is somewhat antithetical to ours: the goal is to out- 
put a full rank density operator, rather than a rank defi- 
cient one. This is desirable in a context where one cannot 
make the approximation that rare events will never hap- 
pen. Blume-Kohout's hedged MLE [53] and Bayesian 
mean estimation [52] are good examples of this type of 
estimator, as are the minimax estimator of Ref. [54] and 
the so-called Max-Ent estimators [55-58]. The latter are 
specifically for the setting where the measurement data 
are not informationally complete, and one tries to mini- 
mize the bias of the estimate by maximizing the entropy 
along the directions where one has no knowledge. 

By contrast, our low rank methods do not attempt 
to reconstruct the complete density matrix, but only a 
rank-r approximation, which is accurate when the true 
state is close to low-rank. From this perspective, our 
methods can be seen as a sort of Occam's Razor, using 
as few fit parameters as possible while still agreeing with 
the data [59]. Furthermore, as we show here and else- 
where [17], informationally incomplete measurements can 
still provide faithful state reconstructions up to a small 
truncation error. 

One additional feature of our methods is that we are 
deeply concerned with the feasibility of our estimators 
for a moderately large number of qubits (say, 10-15). In 
contrast to most of the existing literature, we adopt the 
perspective that it is not enough for an estimator to be 
asymptotically efficient in the number of copies for fixed 
d. We also want the scaling with respect to d to be op- 
timal. We specifically take advantage of the fact that 
many states and processes are described by low rank 
objects to reduce this complexity. In this respect, our 
methods are similar to tomographic protocols that are 
tailored to special ansatz classes of states, such as those 
recently developed for use with permutation-invariant 
states [60], matrix product states [61] or multi-scale en- 
tangled states [62]. 

Our error bounds are somewhat unique as well. Most 
prior work on error bounds used either standard resam- 
pling techniques or Bayesian methods [42, 43, 45, 48, 50, 
52]. Very recently, Christandl & Renner and Blume- 
Kohout independently derived two closely related ap- 
proaches for obtaining confidence regions that satisfy or 
nearly satisfy certain optimality criteria [63, 64]. Espe- 
cially these latter approaches can give very tight error 
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bounds on an estimate, but they can be computation- 
ally challenging to implement for large systems. The 
error bounds which most closely resemble ours are of 
the "large deviation type"; see for example the discus- 
sion in Ref. [38]. This is true for the new improved 
error bounds, as well as the original bounds proven in 
Refs. [17, 18]. These types of bounds are much easier to 
calculate in practice, which agrees with our philosophy on 
computational complexity, but may be somewhat looser 
than the optimal error bounds obtainable through other 
more computationally intensive methods such as those of 
Refs. [63, 64]. 



B. Notation and Outline 

We denote Pauli operators by P or P^, We define 
[n] = {1, . . . , n}. The norms we use are the standard Eu- 
clidean vector norm ||x||2, the Frobenius norm ||V||^ = 
y/Tr(XfX), the operator norm ||X|| = ^X^JjOX) 
and the trace norm ||X|| tr = Tr|X|, where \X\ = VxTx. 
The unknown "true" state is denoted p and any estima- 
tors for p are given a hat: p. The expectation value of 
a random variable X is denoted MX. We denote by M d 
the set of d x d Hermitian matrices. 

The paper is organized as follows. In Section II we de- 
tail the estimators and error bounds, then upper bound 
the sample complexity. In Section III we derive lower 
bounds on the sample complexity. In Section IV we find 
an efficient method of certifying the state estimate. In 
Section V we detail our numerical investigations. We 
show how our scheme can be applied to quantum chan- 
nels in Section VI and conclude in Section VII. 



II. THEORY 

We describe our compressed tomography scheme in de- 
tail. First we describe the measurement procedure, and 
the method for reconstructing the density matrix. Then 
we prove error bounds and analyze the sample complex- 
ity. 



A. Random Pauli Measurements 

Consider a system of n qubits, and let d = 2". Let V 
be the set of all d 2 Pauli operators, i.e., matrices of the 
form P = ax® ■ ■ ■ ® a n where er^ e {I, a x ,a v ,a z }. 

Our tomography scheme works as follows. First, 
choose to Pauli operators, P±,...,P m , by sampling in- 
dependently and uniformly at random from V . (Al- 
ternatively, one can choose these Pauli operators ran- 
domly without replacement [65], but independent sam- 
pling greatly simplifies the analysis.) We will use t copies 
of the unknown quantum state p. For each i £ [m] , take 
t/m copies of the state p, measure the Pauli observable 



Pi on each one, and average the measurement outcomes 
to obtain an estimate of the expectation value Tr(P,p). 
(We will discuss how to set to and t later. Intuitively, 
to learn a d x d density matrix with rank r, we will set 
to ~ rd log 6 d and t ~ r 2 d 2 logd) 

To state this more concisely, we introduce some nota- 
tion. Define the sampling operator to be a linear map 
A : M d -> R m defined for all i € [to] by 

(Ap))i = \fI^P) ■ (1) 

The normalization is chosen so that EM* .4 = 1, where I 
denotes the identity superoperator and A* is the adjoint 
of A. We can then write the output of our measurement 
procedure as a vector 

y = A(p) + z, (2) 

where z represents statistical noise due to the finite num- 
ber of samples, or even due to an adversary. 



B. Reconstructing the Density Matrix 

We now show two methods for estimating the density 
matrix p. Both are based on the same intuition: find 
a matrix X 6 M d that fits the data y while minimizing 
the trace norm ||X|| tr , which serves as a surrogate for 
minimizing the rank of X. In both cases, this amounts 
to a convex program, which can be solved efficiently. 

(We mention that at this point we do not require that 
our density operators are normalized to have unit trace. 
We will return to this point later in Section V.) 

The first estimator is obtained by constrained trace- 
minimization (a.k.a. the matrix Dantzig selector): 

p DS = argmin||V|| tl . s.t. \\A*(A(X) - y)\\ < A. (3) 

.A 

The parameter A should be set according to the amount 
of noise in the data y; we will discuss this later. 

The second estimator is obtained by least-squares lin- 
ear regression with trace-norm regularization (a.k.a. the 
matrix Lasso): 

pLasso = argmini||„4(V) - y\\% + fJt\\X\\ tI . (4) 

Again the regularization parameter p should be set ac- 
cording to the noise level; we will discuss this later. 

One additional point is that we do not require posi- 
tivity of the output in our definition of the estimators 
(3), (4). One can add this constraint (since it is convex) 
and the conclusions below remain unaltered. We will 
explicitly add positivity later on when we do numerical 
simulations, and discuss any tradeoffs that result from 
this. 
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C. Error Bounds 

In previous work on compressed tomography [17, 18], 
error bounds were proved by constructing a "dual certifi- 
cate" [GG] (using convex duality to characterize the solu- 
tion to the trace-minimization convex program) . Here we 
derive stronger bounds using a different tool, known as 
the restricted isometry property (RIP). The RIP for low- 
rank matrices was first introduced in Ref. [32], and was 
recently shown to hold for random Pauli measurements 
[30]. 

We say that the sampling operator A satisfies the rank- 
r restricted isometry property if there exists some con- 
stant < 8 r < f such that, for all X £ £ dxd with rank r, 

(1 - <5 r )||X|| F < \\A(X)\\ 2 < (1 + <fr)||*||F ■ (5) 

For our purposes, we can assume that X is Hermitian. 
Note that this notion of RIP is analogous to the one used 
in Ref. [19], except that it holds over the set of low-rank 
matrices, rather than the set of sparse matrices. 

The random Pauli sampling operator (1) satisfies RIP 
with high probability, provided that m > Crd log 6 d (for 
some absolute constant C); this was recently shown in 
Ref. [30, Theorem 2.1]. We note, however, that this RIP 
result requires m to be larger by a poly log d factor com- 
pared to previous results based on dual certificates. Al- 
though to is slightly larger, the advantage is that when 
combined with the results of Ref. [31], this immediately 
implies strong error bounds for the matrix Dantzig selec- 
tor and the matrix Lasso. 

To state these improved error bounds precisely, we in- 
troduce some definitions. For the rest of Section II, let C, 
Co, Ci, C' Q and C[ be fixed absolute constants. For any 
quantum state p, we write p — p r + p c , where p r is the 
best rank-r approximation to p (consisting of the largest 
r eigenvalues and eigenvectors), and p c is the residual 
part. Now we have the following: 

Theorem 1. Let A be the random Pauli sampling opera- 
tor (1) with m > Crd log 6 d. Then, with high probability, 
the following holds: 

Let /5ds be the matrix Dantzig selector (3), and choose 
A so that ||.4*(z)|| < A. Then 

\\pT>S - P\\tr < C rX + Cl||p c ||tr • 

Alternatively, let pLasso be matrix Lasso (4), and 
choose p so that \\A*(z)\\ < A*/2. Then 

HpLasso - Plltr < C" Q rp + C[\\p c \\ t r ■ 

In these error bounds, the first term depends on the 
statistical noise z. This in turn depends on the number 
of copies of the state that are available in the experiment; 
we will discuss this in the next section. The second term 
is the rank-r approximation error. It is clearly optimal, 
up to a constant factor. 

Proof: These error bounds follow from the RIP as 
shown by Theorem 2.1 in Ref. [30], and a straightforward 



modification of Lemma 3.2 in Ref. [31] to bound the error 
in trace norm rather than Frobenius norm (this is similar 
to the proof of Theorem 5 in Ref. [67]). 

The modification of Lemma 3.2 in Ref. [31] is as fol- 
lows 1 . (For the remainder of this section, equation num- 
bers of the form (III.x) refer to Ref. [31].) In the case of 
the Dantzig selector, let H = pvs — p- Following equation 
(III. 8), we can get the following bound: 

||-ff||tr < H-Holltr + H-ffclk < 2 ll#0||tr + 2||p c || tr , (6) 

where we used the triangle inequality and equation 
(III. 8). Then, at the end of the proof, we write: 

||.ffo||tr < V^\\H \\ F < V^WHo + H 1 \\ F 

< dAV2r\ + Ci2V2<74r||/3 C ||tr, 

where we used Cauchy-Schwarz, the fact that Hq and 
Hi are orthogonal, the bound on ||JJo + Hi\\f following 
equation (III. 13), and equation (III. 7). Combining (6) 
and (7) gives our desired error bound. The error bound 
for the Lasso is obtained in a similar way; see section 
III.G in Ref. [31]. □ 

D. Sample Complexity 

Here we bound the sample complexity of our tomogra- 
phy scheme, that is, we bound the number of copies of the 
unknown quantum state p that arc needed to obtain our 
estimate up to some accuracy. What we show, roughly 
speaking, is that t = 0((^) 2 logd) copies are sufficient 
to reconstruct an estimate of an unknown rank r state up 
to accuracy e in the trace distance. For comparison, note 
that when r = d, and one does full tomography, 0(d 4 /e 2 ) 
copies are sufficient to estimate a full-rank state with ac- 
curacy e in trace distance 2 . 

To make this claim precise, we need to specify how 
we construct our data vector y from the measurement 
outcomes on the t copies of the state p. For the matrix 
Dantzig selector, suppose that 

t > 2C 4 (C r/e) 2 d{d+l)\ogd (8) 

for some constants C4 > 1 and e < Co- (For the matrix 
Lasso, substitute C' for Co in these equations.) We con- 
struct an estimate of A(p) as follows: for each is [777], we 



1 Note that Ref. [31] contains a typo in Lemma 3.2: on the right 
hand side, in the second term, it should be |[M c [|*/\A"i n °t 
l|Af c ||*/r. 

2 To see this, let Pi, ... , P d i denote the Pauli matrices. For each 
i S [d 2 ], measure Pi 0(d 2 /e 2 ) times, to estimate its expectation 
value with additive error ±0(e/d). Equivalcntly, one estimates 
the expectation value of Pi/Vd with additive error ±0(e/d 3 ' 2 ). 
Using linear inversion, one gets an estimated density matrix with 
additive error Ole/d 1 / 2 ) in Frobenius norm, which implies error 
O(e) in trace norm. 
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take t/m copies of p, measure the random Pauli observ- 
able Pi on each of the copies, and use this to estimate 
Tr(Pjp). Then let y be the resulting estimate of A(p), 
and let z = y — A(p). Everything else is defined exactly 
as in Theorem 1. 

Theorem 2. Given t = 0((^) 2 logd) copies of p as in 
Eq. (8) and measured as discussed above, then the fol- 
lowing holds with high probability over the measurement 
outcomes: 

Let pus be the matrix Dantzig selector (3), and set 
A = e/(Cor) for some e > 0. Then 

||PDS ~ P\\tr < e + Cl||/9 c || t r- 

Alternatively, let pLasso ^ e ^ e matrix Lasso (4), and 
set fx = e I (C' r) . Then 

IIPLasso - p\\tr < £ + Cl || PcH tr • 

Proof: Our claim reduces to the following question: 
if we fix some value of A > 0, how many copies of p are 
needed to ensure that the measurement data y satisfies 
\\A*(y - A(p))\\ < A? Then one can apply Theorem 1 to 
get an error bound for our estimate of p. 

Let t be the number of copies of p. Say we fix the mea- 
surement operator A, i.e., we fix the choice of the Pauli 
observables Pi, ... , P m . (The measurement outcomes are 
still random, however.) For i £ [m] and j £ [t/m], let 
Bij £ {1,-1} be the outcome of the j'th repetition of 
the experiment that measures the z'th Pauli observable 
Pi. Note that EB i3 - = Tr(Pip). Then construct the vec- 
tor y £ M. m conta ining the estimated expectation values 
(scaled by yjd/m): 



t/m 
3=1 



i £ m 



(9) 



Note that My=A(p). 

We will bound the deviation ||.4*(y — .A(/?))||, using the 
matrix Bernstein inequality. First we write 

m m t/m 

a'(v) = viE PiVi = ^ EE p ^ ( 10 ) 

i=i 1= i 



i=l 



and also 



(11) 



We can now write A* (y — A(p)) as a sum of independent 
(but not identical) matrix-valued random variables: 

m t/m 

A* (y - A(p)) = E E X v > X v = -t p i [ B H ~ Tr(Pp)] • 

(12) 



(13) 



Note that Ely = and \\X i:j \\ < 2d/t =: R. Also, for 
the second moment we have 

E(X§)=E(£j[B y -'&(P i p)] a ) 
= £l[l-Tr(P iP ) 2 ]. 

Then we have 

a 2 := ||^E(X 2 )|U^^[l-Tr(Pp) 2 ] 



(14) 



- 1 ' e- - t ■ 



Now the matrix Bernstein inequality (Theorem 1.4 in 
[68]) implies that 



Pr 



\\A*(y-A(p))\\ > A] <^cxp(- ^ /3) ) 
^ d ' ex p(-^£fT)) 



(15) 



(where we assumed A < 1). 

For the matrix Dantzig selector, we set A = e/(Cor), 
and we get that, for any t > 2C/±\~ 2 d(d + l)logci = 
2Ci(C r/e) 2 d(d+ l)logd, 



Pr[\\A*(y-A(p))\\ > ^] < d ■ ex P (-C 4 logd) 



(16) 



which is exponentially small in C4. Plugging into Theo- 
rem 1 completes the proof of our claim. A similar argu- 
ment works for the matrix Lasso. □ 



III. LOWER BOUNDS 

How good are the sample complexities of our algo- 
rithms? Here we go a long way toward answering this 
question by proving nearly tight lower bounds on the 
sample complexity for generic rank r quantum states us- 
ing single-copy Pauli measurements. Previous work on 
single-copy lower bounds has only treated the case of 
pure states [33]. 

Roughly speaking, we show the following, which we 
will make precise at the end of the section. 

Theorem 3 (Imprecise formulation). The number of 
copies t needs to grow at least as fast as f2(r 2 G? 2 /logd) to 
guarantee a constant trace-norm confidence interval for 
all rank-r states. 

The argument proceeds in three steps. First, we fix 
our notion of risk to be the minimax risk, meaning we 
seek to minimize our worst case error according to some 
error metric such as the trace distance. We want to know 
how many copies of the unknown state we need to make 
this minimax risk an arbitrarily small constant. For a 
fixed set of two-outcome measurements, say Pauli mea- 
surements, we then show that some states require many 
copies to achieve this. In particular, these states have 
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the property that they are globally distinguishable (their 
trace distance is bounded from below by a constant) but 
their (Pauli) measurement statistics look approximately 
the same (each measurement outcome is close to unbi- 
ased). The more such states there are, the more copies 
we need to distinguish between them all using solely Pauli 
measurements. Finally, we use a randomized argument 
to show that in fact there are exponentially many such 
states. This yields the desired lower bound on the sample 
complexity. 

Let E be some set of density operators. We want to 
put lower bounds on the performance of any estimation 
protocol for states in S. (We do not initially restrict 
ourselves to states with low rank.) 

We assume the protocol has access to t copies of an un- 
known state p £ E on which it performs measurements 
one by one. At the ith step, it has to decide which ob- 
servable to measure. Let us restrict ourselves to binary 
POVM measurements {IL, 1 — 11;}, where each 11; satis- 
fies < Hi < 1 and may be chosen from a set V . (We 
do not initially restrict ourselves to Pauli measurements, 
either.) We allow the choice of the zth observable to 
depend on the previous outcomes. We refer to the ran- 
dom variables which jointly describe the choice of the zth 
measurement and its random outcome as Y^. At the end, 
these are mapped to an estimate p{Y\, ■ ■ ■ , Y t ) G E. 

In other words, an estimation protocol is specified by 
a set of functions 

IT : Y 1 ,...,Y^ 1 ^V , 
p:Y u ...,Y t ^X. 

Consider a distance measure A : E x S 4 1 on the 
states in E. (For example, this could be the trace dis- 
tance or the infidelity; we need not specify.) Suppose that 
the maximum deviation we wish to tolerate between our 
unknown state and the estimate is given by e > 0. Now 
define the minimax risk 

M*(e) = inf su P Pr[A(p(y),p) > e], (17) 

(p.rii) p6E 

where the infimum is over all estimation procedures 
(p, IT) on t copies with estimator p and choice of observ- 
ables given by 11;. That is, we are considering the "best" 
protocol to be the one whose worst-case probability of 
deviation is the least. 

The next lemma shows that if there are a large number 
of states in E which are far apart (by at least e), and 
whose statistics look nearly random for all measurements 
in V, then the number of copies t must be large too to 
avoid having a large minimax risk. 

Lemma 4. Assume there are states px, . . . , p s C E such 
that 



Then the minimax risk as defined in (17) fulfills 

M*(e)>l-^±i 
logs 

Proof: Let A be a random variable taking values uni- 
formly in [s] . Let Y\ , . . . , Yt be random variables, Y; de- 
scribing the outcome of the ith measurement performed 
on px- Define 

X(Y) :=argminA(p(y),p, ; )- 

i 

Then 

Pr[A(p(y), Pi ) > e] > Pr[A(Y) ^ A] . (18) 

Now combine Fano's inequality 

H(X\X) < l + Pr[A ^ A] logs, 

the data processing inequality 

I(X;X(Y))<I(X;Y), 

in terms of the mutual information /(A; Y) := H(X) — 
H(X\Y), and the fact that H(X) = logs to get 



Pr[A(F) i= A] > 



H(X\X) - 1 
logs 

H(X) - I(X: A) - 1 



= 1 
> 1 
= 1 



logs 
7(A;A) + 1 

logs 
I(X;Y) + 1 

logs 

H(Y) - H(Y\X) + 1 
logs 



> 



1 t~ 1 -j:t =1 H(Y\X = i ) + l 
log s 



Vi,VIl e V : Itr^n- \\<a. 



Let h(p) be the binary entropy and recall the standard 
estimate 

h(l/2±a) > (1 -4a 2 ). 

Combine that with the chain rule [69, Theorem 2.5.1]: 
t 

H(Y\X =i)=Yl H ( Y i\ Y i-l> ■■-,Yi,X = i) 
>t(l-4a 2 ). 

The advertised bound follows. □ 

By applying the following lemma s < e crd times, we 
can randomly create a set of s states each with rank r 
that satisfy the conditions of Lemma 4 in terms of the 
trace distance and the set of Pauli measurements. 



Lemma 5. For any < e < 1 — 5, let pi,...,p s be 

normalized? rank-r projections on C d , where s < e c ^ rd 
and c(e) is specified below. Then there exists a normalized 
rank-r projection p such that: 

l 



Vi G [s] : |||p-Pt||tr > e. 
yP k ^l: |tr[i(l±P fe )p] -i| <a. 



(19) 
(20) 



Here, a 2 
and 



where we use ||A|| < 2 in the last line, which follows 
from the triangle inequality and the fact that any A can 
be written as a difference A = O' — O for O' £ SO(d). 

From these ingredients, we can invoke Levy's Lemma 
on the special orthogonal group [70, Theorem 6.5.1] to 
get that for all t > 0, 

c 2 t 2 rd^ 



O 



(^), the P k are n-qubit Pauli operators, Pr [H^ ~ P\\* < 2 ( X ~ r / d ) ~ A < ^ ex p(~^lT~) 



c(e) 



ln(8/7r) 
2rd 



1 

32 



Proof: Let p be some normalized rank-r projection and 
choose p according to 4 

p = O P0 O T 

for a Haar-random O E SO(d). Here we use the special 
orthogonal group SO(d) because the analysis becomes 
marginally simpler than if we use a unitary group. 

To check (19), choose i £ [s] and define Ri to be the 
projector onto the range of p\. Also define the function 

/:OH \\p t -Op O T \\ tr . 

We can bound the magnitude of / using the pinching 
inequality: 



f(0) >\\p l -R t O Po O T R l 



\RfOp O R 



>tr(p 4 ) - tT(R z O Po O T R. l ) + tv{RiOp a O T Ri) 
= l+tr[(i? i L - R l )Op O T ] . 

From this we can bound the expectation value of / over 
the special orthogonal group: 

E[/(0)] >l+tv[(Rl - Ri)(lSOpO T )] 
=l + hr[(Rt-R l )l] 



2r 



2 1 



Next we get an upper bound of ^ on the Lipschitz con- 
stant of / with respect to the Frobenius norm: 

|/(0 + A) -f(0)\ 

<\\(0 + A) Po (0 + Af - Op O T \\ tr 
<||Op A T ||tr + ||A Po O T ||tr + ||A Po A T || tr 
=2||Ap || tr +tr(A /O0 A T ) 
<2Vf||Ap ||F+tr(p AA T ) 

<2Vr||A|| F ||p || -' ''" x 11 v " 



A||||A J l|tr 



<^I|A||, 



■V^||A|| 



< 



IAI 



3 Normalized to have trace 1. 

4 For the duration of this proof, the letter O denotes an element 
of SO(d) instead of the asymptotic big-O notation. 



where the constants are given by c\ — In(y7r78) and 
C2 = 1/8. Now choose t = 2(1 — r/d) — 2e and apply the 
union bound to obtain 

Pr[(19) does not hold] 

<e c ^ rd Pr[\\ Pl - p\\ tI < 2(1 - r/d) - t] 



< exp ( rd 



c(e) 



C2*l 

16 



Cl 



The upper bound on e follows from the requirement that 
t > 0. This shows that p indeed satisfies Eq. (19). 

Now we move on to Eq. (20). For any non- identity 
Pauli matrix Pk, define a function 

/:Oh tv(P k O Po O T ) . 

Clearly, we have E[/(0)] = 0. We again wish to bound 
the rate of change so that we can use Levy's Lemma, so 
we compute 

(d G /)(A) =tr(p a O T P k A) +tr(P k Op A T ) 
=tr[(p O T P k +p^O T P k T )A] , 

which implies that 



||V/(O)|| 2 = ||p O J P fc +^O J P fc J ||f < 



Levy's Lemma then gives for all t > 



Pr[|trP fc p| > t] < e Cl exp 



C2t 2 rd 



Choosing t = 2a, and a 2 = 41n(d 4 7r/8)/(rrf), then the 
union bound gives us 



Pr[(20) does not hold] 

<d 2 Pr[|trP fc( o| > 2a] 

c 2 {2a) 2 rd 



ci = 1 , 



< exp ( 2 In d — 



from which the lemma follows. □ 

We remark that a version of Lemma 5 continues to 
hold even if we can adaptively choose from as many as 
2°( n ) additional measurements which are globally uni- 
tarily equivalent to Pauli measurements. 

Combining the two previous lemmas yields a precise 
formulation of Theorem 3. 
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Theorem 6 (Precise version of Theorem 3). Fix e € 
(0, 1 — 5) and S £ [0, 1). Then for our bound to allow for 
M*(e) < 5 we must have that the number of copies t of 
p grows like 



t = n 



\ogd 



where the implicit constant depends on S and e. 



IV. CERTIFYING THE STATE ESTIMATE 



Theorem 7. Given a state estimate p with rank( ( o) = r, 
the number of copies t of the state p required for esti- 
mating F(p, p) to within ±e with probability 1 — 8 using 
single-copy Pauli measurements satisfies 



t = O 



[d\og(r 2 /S) + r 2 /S] 



(22) 



Proof: The result uses the DFE protocol of Refs. [33, 
34], modified as mentioned above, where the states \<pj) 
are the eigenstates of p. 

Expand p in its eigenbasis as p — J2j=i W^jMjY 
Then we have 



Here we sketch how the technique of direct fidelity es- 
timation (DFE), introduced in Refs. [33, 34] for pure 
states, can be used to estimate the fidelity between a 
low rank estimate p and the true state p. The only as- 
sumption we make is that p is a positive semidefinite 
matrix with Tr(/5) < 1 and rank(/5) = r. No assumption 
at all is needed on p. In fact, we also do not assume that 
we obtained the estimate p from any of the estimators 
which were discussed previously. Our certification pro- 
cedure works regardless of how one obtains p, and so it 
even applies to the situation where p was chosen from a 
subset of variational ansatz states, as in [61]. 

Recall that the main idea of DFE is to take a known 
pure state |V , )('0l and from it define a probability distri- 
bution Pr(i) such that by estimating the Pauli expecta- 
tion values Tr(pPi) and suitably averaging it over Pr(i) 
we can learn an estimate of (ip\p\?p). In fact, one does 
not need to do a full average; simply sampling from the 
distribution a few times is sufficient to produce a good 
estimate. 

For non-Hermitian rank-1 matrices \ipj)((f)k\ instead of 
pure states, we need a very slight modification of DFE. 
Following the notation in Ref . [33] , we simply redefine the 
probability distribution as Pr(i) = |(</>j \Pi\cf>k}\ 2 /d and 
the random variable X = Tr(Pjp) / ((f>k\Pi\(f>j) . It is easy 
to check that E(JT) = (<fij\p\<j)k) and the variance of X is 
at most one. Then all of the conclusions in Refs. [33, 34] 
hold for estimating the quantity (4>j\p\(f>k). In particular, 
we can obtain an estimate to within an error ±e with 
probability 1 — S by using only single-copy Pauli mea- 
surements and 0(jjg + dl °g( 2 1 /' 5 ) ) copies of p. 

Our next result shows that by obtaining several such 
estimates using DFE, we can also infer an estimate of the 
mixed state fidelity between an unknown state p and a 
low rank estimate p, given by 



F(p,p)= [TtVG]'- 



(21) 



where for brevity we define G ~ y/ppy/p. (Note that 
some authors use the square root of this quantity as the 
fidelity. Our definition matches Ref. [33].) The asymp- 
totic cost in sample complexity is far less than the cost 
of initially obtaining the estimate p when r is sufficiently 
small compared to d. 



g= E VWk{<i>M<i>k)\<t>j){<t>k\ 



(23) 



For all 1 < j < k < r, we use direct fidelity esti- 
mation to obtain an estimate jjjk of the matrix element 
(4>j\p\<f>k), up to some additive error ejk that is bounded 
by a constant, le^l < eo- 

If each estimate is accurate with probability 1 — 
2S/(r 2 + r), then by the union bound the probability 
that they are all accurate is at least 1 — 5. The total 
number of copies t required for this is 



t = O 



[d\og(r 2 /S) + r 2 /S] 



Let g jk = sL , and let 



G= ^ V X 3 X k9jk\<l>]) 



(24) 



(25) 



be our estimate for G. Finally, let G + = [G]+ be the 
positive part of the Hermitian matrix G, and let F — 
[Tr\/ G+] 2 be our estimate of the fidelity F(p,p). Note 

that we may assume that F < 1, since if it were larger, 
we can only improve our estimate by just truncating it 
back to 1. 

We now bound the error of this fidelity estimate. We 
can write G as a perturbation of G, G = G + E, where 



(26) 



First notice that the Frobenius norm of this perturbation 
is small, 



\E\ 



2jAjAfc|ejfc|" 



1/2 



(27) 



(If p is subnormalized, then this last equality becomes an 
inequality.) 

Next observe that 



\F-F\ = 



/ G+)Tr(v / G 



< 2A, 
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where we define 



(28) 



Using the reverse triangle inequality, we can bound A in 
terms of the trace norm 



A < 



(29) 



Using [71, Thm. X.1.3] in the first step, we find that 



G-G+l 



< yW G- G H 



where the second bound follows from the Cauchy- 
Schwarz inequality on the vector of eigenvalues of \G — 
G + \. Using the Jordan decomposition of a Hermitian 
matrix into a difference of positive matrices X = [X] + — 
[X] _ , we can rewrite G — G + as 



G-G+ = G-[G + E\. 



\G + E]_ 



(30) 



Then by the triangle inequality and positivity of G, 

||G - 6+ || to < ||£|| tr + ||[G + £]_|| tr < 2||£|| tr . (31) 

Using the standard estimate ||-E||tr < V^II-^II-F, we find 

A<r 3 / 4 v^o. (32) 

This gives our desired error bound, albeit in terms of eo 
instead of the final quantity e. To get our total error to 
vanish, we take e = 2r 3 / 4 v / 2e , which gives us the final 
scaling in the sample complexity. □ 

As a final remark, we note that by computing A for 
the special case p = p = t/r, e = egl/r, we find that 



A = VI + re - 1 , 



(33) 



so the error bound for this protocol is tight with respect 
to the scaling in eo (and hence e). However, we cannot 
rule out that there are other protocols that achieve a 
better scaling. Also, it seems that the upper bound for 
the current protocol with respect to r could potentially 
be improved by a factor of r with a more careful analysis. 



V. NUMERICAL SIMULATIONS 

We numerically simulated the reconstruction of a 
quantum state given Pauli measurements and the esti- 
mators from Eqs. (3) and (4). Here we compare these 
estimates to those obtained from a standard maximum 
likelihood estimate (MLE) as a benchmark. 

As mentioned previously, all of our estimators have the 
advantage that they are convex functions with convex 
constraints, so the powerful methods of convex program- 
ming [72] guarantee that the exact solution can be found 
efficiently and certified. We take full advantage of this 
certifiability and do simulations which can be certified by 



interior-point algorithms. This way we can separate out 
the performance of the estimators themselves from the 
(sometimes heuristic) methods used to compute them on 
very large scale instances. 

For our estimators, we will explicitly enforce the posi- 
tivity condition. That is, we will simulate the following 
slight modifications of Eqs. (3) and (4) given by 



^ s = argminTrA s.t. \\A*(A(X) 

A ^0 



y)ll<A, (34) 



and 



= axgmm ±\\A(X) - yf 2 + pTrX . 



(35) 



Moreover, whenever the trace of the resulting estimate is 
is less than one, we renormalize the state, p <-\ p/Tr(p). 



A. Setting the Estimator Parameters A and p 

From Thm. 1 we know roughly how we should choose 
the free parameters A and jj,, modulo the unknown con- 
stants Gj, for which we will have to make an empirical 
guess. We guess that the log factor is an artifact of the 
analysis, and that the state is nearly pure. Then we 
could choose A ~ p ~ d/y/i. For the matrix Dantzig 
selector of Eq. (34), we specifically chose A = 2>d/y/t for 
our simulations. However, this did not lead to very good 
performance for the matrix Lasso of Eq. (35) when m 
was large. We chose instead p — 4m/y/i, which agrees 
with A when m ~ d, as it can be for nearly pure states. 

We stress that very little effort was made to optimize 
these choices for the parameters. We simply picked a few 
integer values for the constants in front (namely 2, . . . , 5), 
and chose the constant that worked best upon a casual 
inspection of a few data points. We leave open the prob- 
lem of determining what the optimal choices are for A 
and p. 



B. Time Needed to Switch Measurement Settings 

Real measurements take time. In a laboratory setting, 
time is a precious commodity for a possibly non-obvious 
reason. An underlying assumption in the way we typi- 
cally formulate tomography is that the unknown state p 
can be prepared identically many times. However, the 
parameters governing the experiment often drift over a 
certain timescale, and this means that beyond this char- 
acteristic time, it is no longer appropriate to describe the 
measured ensemble by the symmetric product state p®*. 

To account for this characteristic timescale, we intro- 
duce the following simple model. We assume that the en- 
tire experiment takes place in a fixed amount of time T. 
In a real experiment, this is the timescale beyond which 
we cannot confidently prepare the same state p, or it is 
the amount of time it takes for the student in the lab to 
graduate, or for the grant funding to run out, whichever 
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comes first. Within this time limit, we can either change 
the measurement settings, or we can take more samples. 
We assume that there is a unit cost to taking a single 
measurement, but that switching to a different measure- 
ment configuration takes an amount of time c. 

At the one extreme, the switching cost c to change 
measurement settings could be quite large, so that it is 
only barely possible to conduct enough independent mea- 
surements that tomography is possible within the allotted 
time T. In this case, we expect that compressed tomogra- 
phy will outperform standard methods, since these meth- 
ods have not been optimized for the case of incomplete 
data. At the other extreme, the switching cost c could be 
set to (or some other small value), in which case we are 
only accounting for the cost of sampling. Here it is not 
clear which method is better, for the following reason. 
Although the standard methods are able to take a suf- 
ficient amount of data in this case, each observable will 
attain comparatively little precision relative to the fewer 
observables measured with higher precision in the case 
of compressed tomography for the same fixed amount of 
time T. One of our goals is to investigate if there are any 
tradeoffs in this simple model. 

For the relative cost c between switching measurement 
settings and taking one sample, we use c = 20, a number 
inspired by the current ion trap experiments done by the 
Blatt group in Innsbruck. However, we note that the 
conclusions don't seem to be very sensitive to this choice, 
as long as we don't do something outrageous like c > 
t, which would preclude measuring more than even one 
observable. 



C. Other Simulation Parameters 

We consider the following ensembles of quantum states. 
We initially choose n = 5 qubit states from the invariant 
Haar measure on C 2 . Then we add noise to the state by 
applying independent and identical depolarizing channels 
to each of the n qubits. Recall that the depolarizing 
channel with strength 7 acts on a single qubit as 

P»=7~ + (1-7)P, (36) 

that is, with probability 7 the qubit is replaced by a 
maximally mixed state and otherwise it is left alone. Our 
simulations assume very weak decoherence, and we use 
7 = 0.01. 

We use two measures to quantify the quality of a state 
reconstruction p relative to the underlying true state p, 
namely the (squared) fidelity HvT^/Plltr an ^ the trace 
distance — /5|| tr - 

We used the interior-point solver SeDuMi [26] to do the 
optimizations in Eqs. (34) and (35). Although these algo- 
rithms are not suitable for large numbers of qubits, they 
produce very accurate solutions, which allows a more re- 
liable comparison between the different estimators. The 
data and the code used to generate the data for these 



plots can be found with the source code for the arXiv ver- 
sion of this paper. For larger numbers of qubits, one may 
instead use specialized algorithms such as SVT, TFOCS, 
or FPCA to solve these convex programs [27-29]; how- 
ever, the quality of the solutions depends somewhat on 
the algorithm, and it is an open problem to explore this 
in more detail. 

For the MLE, we used the iterative algorithm of 
Ref. [73] , which has previously been used in e.g. Ref. [74] . 
This method converged on every example we tried, so 
we did not have to use the more sophisticated "diluted" 
method of Ref. [75] that guarantees convergence. 

For the purposes of our simulations, we sampled our 
Pauli operators without replacement. 



D. Results and Analysis 

The data are depicted in Fig. 1. The three subfigures 
a-c use three different values for the total sampling time 
T in increasing order. We have plotted both the fidelity 
and the trace distance (inset) between the recovered so- 
lution and the true state. 

Several features are immediately apparent. First, we 
see that both of the compressed sensing estimators con- 
sistently outperform MLE along a wide range of values of 
m, the number of Paulis sampled, even when we choose 
the optimal value of m just for the MLE. Once m is suf- 
ficiently large (but still m -c d 2 ) all of the estimators 
converge to a reasonably high fidelity estimate. Thus, 
it is not just the compressed sensing estimators which 
are capable of reconstructing nearly low rank states with 
few measurement settings, but rather this seems to be a 
generic feature of many estimators. However, the com- 
pressed sensing estimators seem to be particularly well- 
suited for the task. 

When the total amount of time T is very small 
(Fig. la), then there is a large advantage to using com- 
pressed sensing. In this regime, there is barely enough 
time to make one measurement per setting if we insist 
on making an informationally complete measurement, so 
the measurement statistics aren't even Gaussian. Thus, 
the fluctuations are so large that trying to fit a density 
operator to these data just leads to poor results because 
you wind up fitting only noise. While the advantage of 
compressed sensing is clear in this regime, it is not ap- 
plicable when we are interested in extremely high-fidelity 
state reconstructions (namely greater than 95%). 

As the total amount of time available increases, all of 
the estimators of course converge to higher fidelity esti- 
mates. Interestingly, for the compressed sensing estima- 
tors, there seems to be no tradeoff whatsoever between 
the number of measurement settings and the fidelity once 
T and m are sufficiently large. That is, the curve is com- 
pletely flat as a function of m above a certain cutoff. This 
is most notable for the matrix Lasso, which consistently 
performs at least as well as the matrix Dantzig selector, 
and often better. These observations are consistent with 



12 



Fidelity and Trace Distance T=41000, c=20 



b) 



c) 





| 0.4 

.S 0-3 
°0.2 
1 0.1 
0. 




i-H HHH 



128 



256 384 512 640 768 
Number of Sampled Paulis (m) 



896 1024 



Fidelity and Trace Distance T=80000, c=20 




256 384 512 640 768 
Number of Sampled Paulis (m) 

Fidelity and Trace Distance T=270000, c=20 



1024 




128 



256 384 512 640 768 
Number of Sampled Paulis (m) 



896 1024 



FIG. 1: Fidelity and trace distance (inset) as a function of 
m, the number of sampled Paulis. Plots a), b) and c) are 
for a total measurement time of T = 41k, 80k, and 270k 
respectively, in units where the cost to measure a single copy is 
one unit of time, and the cost to switch measurement settings 
is c = 20 units. The three estimators plotted are the matrix 
Dantzig selector (Eq. (34), blue), the matrix Lasso (Eq. (35), 
red), and a standard MLE (green). Each bar shows the mean 
and standard deviation from 120 Haar-random 5-qubit states 
with 1% local depolarizing noise. Our estimators consistently 
outperform MLE, even after optimizing the MLE over m. See 
the main text for more details. 



the bounds proven earlier. 

The fiat curve as a function of m is especially interest- 
ing, because it suggests that there is no real drawback to 
using small values of m. Smaller values of m are attrac- 
tive from the computational point of view because the 
runtime of each reconstruction algorithm scales with m. 
This makes a strong case for adopting these estimators 
in the future, and at the very least more numerical stud- 
ies are needed to investigate how well these estimators 
perform more generally. 

We draw the following conclusion from these simula- 
tions. The best performance for a fixed value of T is given 
by the matrix Lasso estimator of Eqs. (4) and (35) in the 
regime where m is nearly as small as possible. Here the 
fidelity is larger than the other estimators (if only by a 
small or negligible amount when T is large), but using a 
small value for m means smaller memory and processing 
time requirements when doing the state reconstruction. 
MLE consistently underperforms the compressed sensing 
estimators, but still seems to "see" the low-rank nature 
of the underlying state and converges to a reasonable es- 
timate even when m is small. 



VI. PROCESS TOMOGRAPHY 

Compressed sensing techniques can also be applied to 
quantum process tomography. Here, our method has an 
advantage when the unknown quantum process has small 
Kraus rank, meaning it can be expressed using only a 
few Kraus operators. This occurs, for example, when 
the unknown process consists of unitary evolution (act- 
ing globally on the entire system) combined with local 
noise (acting on each qubit individually, or more gener- 
ally, acting on small subsets of the qubits). 

Consider a system of n qubits, with Hilbert space di- 
mension d = 2". Let £ be a completely positive (CP) 
map from C dxd to C dxd , and suppose that £ has Kraus 
rank r. Using compressed sensing, we will show how 
to characterize £ using rn = 0(rd 2 \ogd) settings. (For 
comparison, standard process tomography requires d A 
settings, since £ contains d independent parameters in 
general.) Furthermore, our compressed sensing method 
requires only the ability to prepare product eigenstates 
of Pauli operators and perform Pauli measurements, and 
it does not require any ancilla qubits. 

We remark that, except for the ancilla-assisted 
method, just the notion of "measurement settings" for 
process tomography does not capture all of the complex- 
ity because of the need to have an input to the chan- 
nel. Here we define one "input setting" to be a basis 
of states from which the channel input should be sam- 
pled uniformly. Then the total number of settings m is 
the sum of the number of measurement settings (Paulis, 
in our case) and input settings. This definition justifies 
the claim that the number of settings for our compressed 
process tomography scheme is m = 0(rd 2 logd). 

The analysis here focuses entirely on the number of 
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FIG. 2: Compressed quantum process tomography using an 
ancilla. The quantum circuit represents a single measurement 
setting, where one measures the observable Pa on the system 
and Pb on the ancilla. 



settings m. We forgo a detailed analysis of t, the sample 
complexity, and instead leave this open for future work. 

Note that there is a related set of techniques for esti- 
mating an unknown process that is elementwise sparse 
with respect to some known, experimentally accessible 
basis [19]. These techniques are not directly compara- 
ble to ours, since they assume a greater amount of prior 
knowledge about the process, and they use measurements 
that depend on this prior knowledge. We will discuss this 
in more detail at the conclusion of this section. 



A. Our Method 

First, recall the Jamiolkowski isomorphism [76]: the 
process £ is completely and uniquely characterized by 
the state 

ps = (8 03X)(|^o><^o|), 

where \ipo) = 4^ £y=i \j) ® Note that when £ has 
Kraus rank r, the state ps has rank r. This immedi- 
ately gives us a way to do compressed quantum process 
tomography: first prepare the Jamiolkowski state ps (by 
adjoining an ancilla, preparing the maximally entangled 
state \ipo), an d applying £); then do compressed quantum 
state tomography on ps; see Figure 2. 

We now show a more direct implementation of com- 
pressed quantum process tomography that is equivalent 
to the above procedure, but does not require an ancilla. 
Observe that in the above procedure, we need to estimate 
expectation values of the form 

Tr((P A <g> P B )p £ ) = Tr((P A <g> P B )(£ ®I)(|Vo)ftM)), 

where Pa and Pb are Pauli matrices. By using the Kraus 
decomposition, it is straightforward to derive the equiv- 
alent expression 

Tr((P A ® P B )p £ ) = -Tt{P a S{Pb)), (37) 
a 

where the bar denotes complex conjugation in the stan- 
dard basis. 



13 



\MPb)) 





£ 




Pa 







FIG. 3: Compressed quantum process tomography, imple- 
mented directly without an ancilla. Here one prepares a ran- 
dom eigenstate of Pb, applies the process £ , and measures 
the observable Pa on the output. 

We now show how to estimate the expectation value 
(37). Let Xj and \cf>j) denote the eigenvalues and eigen- 
vectors of P B - Then we have 

1 d 

Tr((P4 ® P B )p £ ) = jJ2 A,Tr(P^(|^)(^|)). 

3=1 

To estimate this quantity, we repeat the following experi- 
ment many times, and average the results: choose j € [d] 
uniformly at random, prepare the state \4>j), apply the 
process £, measure the observable Pa, and multiply the 
measurement result by Xj. (See Figure 3.) In this way, 
we learn the expectation values of the Jamiolkowski state 
ps without using an ancilla. We then use compressed 
quantum state tomography to learn ps , and from this we 
recover £. 



B. Related Work 

Our method is somewhat different from the method 
described in Ref. [19]. Essentially the difference is that 
our method works for any quantum process with small 
Kraus rank, whereas the method of Shabani et al. works 
for a quantum process that is elementwise sparse in a 
known basis (provided this basis is experimentally ac- 
cessible in a certain sense). The main advantage of the 
Shabani et al. method is that it can be much faster: for 
a quantum process £ that is s-sparse (i.e., has s nonzero 
matrix elements), it requires only 0{s log d) settings. The 
main disadvantage is that it requires more prior knowl- 
edge about £ , and is more difficult to apply. While it has 
been demonstrated in a number of scenarios, there does 
not seem to be a general recipe for designing measure- 
ments that are both experimentally feasible and effective 
in the sparse basis of £ . 

To clarify these issues, we now briefly review the Sha- 
bani et al. method. We assume that we know a basis 
T = {r Q | a £ [d 2 ]} in which the process £ is s-sparse. 
For example, when £ is close to some unitary evolution 
U, one can construct T using the SVD basis of U. This 
guarantees that, if £ contains no noise, it will be perfectly 
sparse in the basis T. However, in practice, £ will con- 
tain noise, which need not be sparse in the basis T; any 
non-sparse components will not in general be estimated 
accurately. The success of the Shabani et al. method 
therefore rests on the assumption that the noise is also 
sparse in the basis T. Although this assumption has been 
verified in a few specific scenarios, it seems less clear why 
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it should hold in general. By comparison, our method 
simply assumes that the noise is described by a process 
matrix that is low rank; this can be rigorously justified 
for any noise process that involves only local interactions 
or few-body processes. 

The other complication with the Shabani et al. method 
concerns the design of the state preparations and mea- 
surements. On one hand, these must satisfy the RIP 
condition for s-sparse matrices over the basis T; on the 
other hand, they must be easy to implement experimen- 
tally. This has been demonstrated in some cases, by using 
states and measurements that are "random" enough to 
show concentration of measure behaviors, but also have 
tensor product structure. However, these constructions 
are not guaranteed to work in general for an arbitrary 
basis T. 

We leave open the problem of doing a comparative 
study between these and other methods [77, 78], akin 
to Ref. [79]. 

VII. CONCLUSION 

In this work, we have introduced two new estimators 
for compressed tomography: the matrix Dantzig selec- 
tor and the matrix Lasso (Eqs. (3) and (4).) We have 
proved that the sample complexity for obtaining an es- 
timate that is accurate to within e in the trace distance 
scales like O(^Hr-logd) for rank-r states, and that for 
higher rank states, the additional error is proportional 
to the truncation error. This error scaling is optimal up 
to constant multiplicative factors, and requires measur- 
ing only 0(rd poly log d) Pauli expectation values, a fact 
we proved using the RIP [30]. We also proved that our 
sample complexity upper bound is within poly log d of 
the sample complexity of the optimal minimax estima- 
tor, where the risk function is given by a trace norm con- 
fidence interval. We showed how a modification of direct 
fidelity estimation can be used to unconditionally certify 
the estimate using a number of measurements which is 
asymptotically negligible compared to obtaining the orig- 
inal estimate. We numerically simulated our estimators 
and found that they outperform MLE, giving higher fi- 
delity estimates from the same amount of data. Finally, 
we generalized our method to quantum process tomog- 
raphy using only Pauli measurements and preparation of 
product eigenstates of Pauli operators. 

There are many interesting open questions that re- 
main. On the theoretical side, one open problem is of 
course to tighten the gap that remains between the sam- 
ple complexity upper and lower bounds. Another open 
problem is to try to prove optimality with respect to al- 
ternative criteria other than minimax risk. For example, 



it would be interesting to find a useful notion of average 
case optimality. 

One major open problem is to switch focus from two- 
outcome Pauli measurements to alternative measure- 
ments which are still experimentally feasible. For exam- 
ple, measurements in a local basis have 2 n outcomes and 
are not directly analyzable using our techniques. It would 
be very interesting to give an analysis of our estimators 
from the perspective of such local basis measurements. 
One difficulty, however, is that something like the RIP is 
not likely to hold in this case, so we will need additional 
techniques. 

On the numerical side, some of the open questions are 
the following. First, it would be very interesting to do 
a more detailed numerical study of the performance of 
our estimators. While they have clearly outperformed 
MLE in the simulations reported here, there is no ques- 
tion that this is a narrow range of parameters on which 
we have tested these estimators. It would be interesting 
to do additional comparative studies between these and 
other estimators to see how robust these performance 
enhancements are. It would also be very interesting to 
study fast first-order solvers such as Refs. [27-29] which 
could compute estimates on a large number of qubits (10 
or more). 

The success of the estimators we studied depends on 
being able to find good values for the parameters A and 
\x. While we have used simple heuristics to pick partic- 
ular values, a detailed study of the optimal values for 
these parameters could only improve the quality of our 
estimators. Moreover, MLE seems to enjoy the same 
"plateau" phenomenon, where the quality of the estimate 
is insensitive to m above a certain cutoff. This leads us 
to speculate that this is a generic phenomenon among 
many estimators, and that perhaps there are even better 
choices for estimators than the ones we benchmark here. 
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